Public data is important for genetic studies
To make this endeavor sustainable, we must proactively map risks
Co-segregation between Y-chr and surnames
[Screenshot: search results page of a free public service from Family Tree DNA. Search by last name, last names matching "erlich", displaying User ID CEEPG.]
Exploiting genetic genealogy databases


The Washington Post

Found on the Web, With DNA: a Boy's Father 

By Rob Stein 

Washington Post Staff Writer 
Sunday, November 13, 2005

Like many children whose mothers used an anonymous 
sperm donor, the 15-year-old boy longed for any shred 
of information about his biological father. But,
uniquely, this resourceful teenager decided to try
exploiting the latest in genetic technology, and the
sleuthing powers of the Internet in his quest. 

By submitting a DNA sample to a commercial genetic 
database service designed to help people draw their 
family tree, the youth found a crucial clue that quickly 
enabled him to track down his long-sought parent.

"I was stunned," said Wendy Kramer, whose online
registry for children trying to find anonymous donors
of sperm or egg helped lead the teenager to his father. 

"This had never been done before. No one knew you 
could get a DNA test and find your donor." 

While welcomed by advocates of children trying to locate anonymous donors, the case — 
apparently the first of its kind — has raised alarm among sperm banks and some medical 
ethicists. They are concerned it might start a trend that could violate the privacy of
An anecdote?
The main idea - a systematic study


Can we recover the identity of anonymous 
sequencing datasets using public resources? 
Empirical test: what is the probability of recovering
a surname?
900


Comparing the predicted 
surname to the true one 


Expectation for US Caucasian males from middle and upper class: 

12% Successful recoveries 
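
The comparison of predicted and true surnames can be illustrated with a minimal sketch. It is not the study's code: the function name `recovery_rate` and the toy surnames are hypothetical, and it assumes a missing prediction (`None`) counts as a failure.

```python
from typing import Optional, Sequence


def recovery_rate(predicted: Sequence[Optional[str]], true: Sequence[str]) -> float:
    """Fraction of individuals whose predicted surname equals the true one.

    A prediction of None means no surname was returned for that individual
    and is counted as unsuccessful (an assumption of this sketch).
    """
    if len(predicted) != len(true):
        raise ValueError("predicted and true must have the same length")
    if not true:
        return 0.0
    hits = sum(
        1
        for p, t in zip(predicted, true)
        if p is not None and p.strip().lower() == t.strip().lower()
    )
    return hits / len(true)


# Hypothetical toy data, not taken from the study.
predicted = ["Smith", None, "Jones", "Brown", None]
true = ["Smith", "Taylor", "Johnson", "Brown", "Lee"]

rate = recovery_rate(predicted, true)
print(f"{rate:.0%}")  # prints 40%
```

In this toy set two of five surnames are recovered; the 12% figure on the slide is the study's own estimate and is not reproduced by this example.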
The Venter case
Can we identify anonymous
personal genomes?
Recovering the identities of CEU individuals


1000 Genomes: A Deep Catalog of Human Genetic Variation
110 CEU genomes
8 surname predictions with Utah ancestry


Google 


Winfield Utah 


Found an obituary that has the exact description of the 
pedigree 


Coriell Institute for Medical Research


Probability of a random match < 5 x 10^-9
Beginner’s luck?



Figure legend (pedigrees): successful surname recovery (targeted individual); person tested by genetic genealogy service (source); patrilineal line from source to target.


Breaching the privacy of close to 50 CEU samples. 
Summary


Our approach: 

- No experimental work involved. 

- The identifying information propagates via deep genealogical ties. 

- The attack completely relies on public resources. 

We tested close to 1,000 Y-STR haplotypes and demonstrated complete
identification of Venter and close to 50 CEU individuals.
IMHO, recommendations


1. Consent: 

- Be honest about risks. Be honest about benefits. 


2. Multi-tier approach: 

- Give participants options for data sharing. 

3. Proactive approach: 

- Keep mapping risks. Friendly hacking is far better
than a real attack.


4. Technical solutions: 

- We did not explore those enough. Much more to do 
here. 
Identifying Personal Genomes by
Surname Inference
Sharing sequencing data sets without identifiers has become a common practice in genomics.
Here, we report that surnames can be recovered from personal genomes by profiling short tandem 
repeats on the Y chromosome (Y-STRs) and querying recreational genetic genealogy databases. 
We show that a combination of a surname with other types of metadata, such as age and state, 
can be used to triangulate the identity of the target. A key feature of this technique is that it entirely 
relies on free, publicly accessible Internet resources. We quantitatively analyze the probability of 
identification for U.S. males. We further demonstrate the feasibility of this technique by tracing back 
with high probability the identities of multiple participants in public sequencing projects. 
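
The two steps described above, surname inference from a Y-STR profile followed by triangulation with metadata, can be sketched on toy data. This is only an illustration under stated assumptions, not the authors' method: a plain mismatch count stands in for whatever matching criterion the study used, the names `GenealogyRecord`, `PublicRecord`, `infer_surname`, and `triangulate` are hypothetical, and all surnames, repeat counts, and records are invented. Birth year stands in for age.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

Haplotype = Dict[str, int]  # Y-STR marker name -> repeat count


@dataclass
class GenealogyRecord:
    """One entry of a (toy) genetic genealogy database."""
    surname: str
    haplotype: Haplotype


@dataclass
class PublicRecord:
    """One entry of a (toy) public record set used for triangulation."""
    name: str
    surname: str
    birth_year: int
    state: str


def infer_surname(query: Haplotype, database: List[GenealogyRecord],
                  max_mismatches: int = 0, min_shared: int = 3) -> Optional[str]:
    """Return the surname of the closest record, or None if nothing qualifies.

    Distance is the number of shared markers whose repeat counts differ;
    this simple criterion is an assumption of the sketch.
    """
    best_surname: Optional[str] = None
    best_distance: Optional[int] = None
    for record in database:
        shared = query.keys() & record.haplotype.keys()
        if len(shared) < min_shared:
            continue  # too few markers in common to compare
        distance = sum(1 for m in shared if query[m] != record.haplotype[m])
        if distance > max_mismatches:
            continue
        if best_distance is None or distance < best_distance:
            best_surname, best_distance = record.surname, distance
    return best_surname


def triangulate(surname: str, birth_year: int, state: str,
                records: List[PublicRecord]) -> List[PublicRecord]:
    """Keep only the records matching surname, birth year, and state."""
    return [r for r in records
            if r.surname.lower() == surname.lower()
            and r.birth_year == birth_year
            and r.state == state]


# Invented toy data.
database = [
    GenealogyRecord("Alder", {"DYS19": 14, "DYS390": 24, "DYS391": 11, "DYS393": 13}),
    GenealogyRecord("Birch", {"DYS19": 15, "DYS390": 23, "DYS391": 10, "DYS393": 13}),
]
query = {"DYS19": 14, "DYS390": 24, "DYS391": 11, "DYS393": 13}
people = [
    PublicRecord("A. Alder", "Alder", 1950, "UT"),
    PublicRecord("B. Alder", "Alder", 1980, "CA"),
    PublicRecord("C. Birch", "Birch", 1950, "UT"),
]

surname = infer_surname(query, database)
candidates = triangulate(surname, 1950, "UT", people) if surname else []
```

The point of the sketch is the narrowing effect: the surname alone leaves two candidates in the toy record set, while adding birth year and state leaves one.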